5 research outputs found

    Mach number and wall thermal boundary condition effects on near-wall compressible turbulence

    We investigate the effects of thermal boundary conditions and Mach number on turbulence close to walls. In particular, we study the near-wall asymptotic behavior for adiabatic and pseudo-adiabatic walls and compare it to the asymptotic behavior recently found near isothermal cold walls (Baranwal et al. 2022). This is done by analyzing a new, large database of highly resolved direct numerical simulations of turbulent channels with different wall thermal conditions and centerline Mach numbers. We observe that the asymptotic power-law behavior of the Reynolds stresses, as well as of the heat fluxes, changes with both centerline Mach number and the thermal condition at the wall. Power-law exponents transition from the values given by the analytical expansion for solenoidal fields to those for non-solenoidal fields as the Mach number is increased, though this transition depends on the thermal boundary conditions. The correlation coefficients between velocity and temperature are also affected by these factors. Consistent with recent proposals on the universal behavior of compressible turbulence, we find that dilatation at the wall is the key scaling parameter for these power-law exponents, yielding a universal functional law that can serve as a basis for general models of near-wall behavior.
    Comment: 24 pages, 15 figures, under consideration for publication in the Journal of Fluid Mechanics
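    To make the solenoidal baseline concrete, the sketch below reproduces the standard near-wall Taylor expansion of velocity fluctuations (a textbook result, not taken from the paper itself), assuming a no-slip wall at y = 0 and writing theta'_w for the fluctuating dilatation at the wall; the exponent shift it implies is the kind of transition the abstract describes.

        % Standard near-wall Taylor expansion of velocity fluctuations (no-slip wall at y = 0);
        % a_1, c_1, b_1, b_2 are wall coefficients, \theta'_w is the fluctuating wall dilatation.
        \begin{align*}
          u' &= a_1 y + O(y^2), &
          w' &= c_1 y + O(y^2), &
          v' &= b_1 y + b_2 y^2 + O(y^3),
          \qquad b_1 = \left.\frac{\partial v'}{\partial y}\right|_{w} = \theta'_w ,
        \end{align*}
        % since u' and w' vanish identically on the wall, their tangential derivatives are zero
        % there and only \partial v'/\partial y contributes to the wall dilatation.
        % Solenoidal limit (\theta'_w = 0):
        %   <u'u'> ~ y^2,  <w'w'> ~ y^2,  <v'v'> ~ y^4,  <u'v'> ~ y^3.
        % Non-zero wall dilatation promotes <v'v'> and <u'v'> to ~ y^2.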

    GenPIP: In-Memory Acceleration of Genome Analysis via Tight Integration of Basecalling and Read Mapping

    Nanopore sequencing is a widely used, high-throughput genome sequencing technology that can sequence long fragments of a genome into raw electrical signals at low cost. Nanopore sequencing requires two computationally costly processing steps for accurate downstream genome analysis. The first step, basecalling, translates the raw electrical signals into nucleotide bases (i.e., A, C, G, T). The second step, read mapping, finds the correct location of a read in a reference genome. In existing genome analysis pipelines, basecalling and read mapping are executed separately. We observe in this work that such separate execution of the two most time-consuming steps inherently leads to (1) significant data movement and (2) redundant computations on the data, slowing down the genome analysis pipeline. This paper proposes GenPIP, an in-memory genome analysis accelerator that tightly integrates basecalling and read mapping. GenPIP improves the performance of the genome analysis pipeline with two key mechanisms: (1) in-memory, fine-grained collaborative execution of the major genome analysis steps in parallel; (2) a new technique for early rejection of low-quality and unmapped reads, which stops genome analysis for such reads early and avoids wasted computation. Our experiments show that, for the execution of the genome analysis pipeline, GenPIP provides 41.6X (8.4X) speedup and 32.8X (20.8X) energy savings with negligible accuracy loss compared to the state-of-the-art software genome analysis tools executed on a state-of-the-art CPU (GPU). Compared to a design that combines state-of-the-art in-memory basecalling and read mapping accelerators, GenPIP provides 1.39X speedup and 1.37X energy savings.
    Comment: 17 pages, 13 figures
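    As a rough software-level illustration of the early-rejection idea described above, the sketch below interleaves basecalling and mapping chunk by chunk and abandons a read as soon as it looks low quality or unmappable. All function names, thresholds, and the toy basecaller and seed check are hypothetical stand-ins, not GenPIP's actual design; GenPIP realizes this integration in processing-in-memory hardware.

        from typing import List, Optional, Tuple

        QUALITY_THRESHOLD = 0.85   # hypothetical mean base-confidence cutoff
        CHUNK = 512                # raw-signal samples consumed per pipeline step

        def basecall_chunk(signal: List[float]) -> Tuple[str, List[float]]:
            # Toy stand-in for a basecaller: maps each sample to a base and a confidence.
            alphabet = "ACGT"
            bases = "".join(alphabet[int(abs(s)) % 4] for s in signal)
            confs = [min(1.0, abs(s)) for s in signal]
            return bases, confs

        def prefix_has_seed(prefix: str, reference: str) -> bool:
            # Toy seed check: does any 16-mer of the basecalled prefix occur in the reference?
            return any(prefix[i:i + 16] in reference for i in range(0, len(prefix) - 15, 16))

        def analyze_read(signal: List[float], reference: str) -> Optional[int]:
            # Interleaved basecalling + mapping with early rejection of hopeless reads.
            bases, confs = "", []
            for start in range(0, len(signal), CHUNK):
                b, c = basecall_chunk(signal[start:start + CHUNK])
                bases += b
                confs += c
                # Early rejection 1: read quality too low -> stop basecalling the rest.
                if sum(confs) / len(confs) < QUALITY_THRESHOLD:
                    return None
                # Early rejection 2: prefix finds no seed in the reference -> stop mapping work.
                if len(bases) >= 64 and not prefix_has_seed(bases, reference):
                    return None
            return reference.find(bases[:16])  # toy "mapping": locate the first 16-mer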

    Optimizing GPU Convnets

    Convolution layers are useful for improving the accuracy of neural networks. In networks like CosmoFlow, with multiple consecutive convolution layers, the runtime of the convolution layers dominates the end-to-end runtime. Several convolution algorithms, such as implicit GEMM, fast Fourier transform (FFT), and Winograd, have been optimized for different platforms. Achieving performance close to theoretical bounds often requires manual fine-tuning specific to the target architecture. We use the DaCe framework to develop portable optimizations of 3D convolutions for the implicit GEMM and direct convolution algorithms on GPUs. We benchmark the optimized code against the available manually tuned library implementations.
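    As background for the implicit-GEMM approach mentioned above, the sketch below shows the GEMM formulation of convolution in plain NumPy for the 2D case (stride 1, no padding). For clarity it materializes the im2col patch matrix explicitly, whereas an implicit-GEMM GPU kernel forms those patches on the fly in registers and shared memory rather than writing them to global memory; this is an illustrative sketch, not the paper's DaCe-based 3D implementation.

        import numpy as np

        def conv2d_as_gemm(x, w):
            # x: (C, H, W) input, w: (K, C, R, S) filters -> (K, H-R+1, W-S+1) output.
            C, H, W = x.shape
            K, _, R, S = w.shape
            OH, OW = H - R + 1, W - S + 1
            # im2col: each output position becomes a column holding the C*R*S inputs it reads.
            cols = np.empty((C * R * S, OH * OW), dtype=x.dtype)
            for i in range(OH):
                for j in range(OW):
                    cols[:, i * OW + j] = x[:, i:i + R, j:j + S].ravel()
            # One large matrix multiply performs all multiply-accumulates of the convolution.
            out = w.reshape(K, C * R * S) @ cols   # (K, OH*OW)
            return out.reshape(K, OH, OW)

        def conv2d_direct(x, w):
            # Naive direct convolution, used here only to check the GEMM formulation.
            C, H, W = x.shape
            K, _, R, S = w.shape
            out = np.zeros((K, H - R + 1, W - S + 1), dtype=x.dtype)
            for k in range(K):
                for i in range(out.shape[1]):
                    for j in range(out.shape[2]):
                        out[k, i, j] = np.sum(x[:, i:i + R, j:j + S] * w[k])
            return out

        # Quick consistency check on random data.
        rng = np.random.default_rng(0)
        x = rng.standard_normal((3, 8, 8))
        w = rng.standard_normal((4, 3, 3, 3))
        assert np.allclose(conv2d_as_gemm(x, w), conv2d_direct(x, w))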